Sampling bias and logistic models
Abstract
In a regression model, the joint distribution for each finite sample of units is determined by a function $p_x(y)$ depending only on the list of covariate values $x = (x(u_1), \ldots, x(u_n))$ on the sampled units. No random sampling of units is involved. In biological work, random sampling is frequently unavoidable, in which case the joint distribution $p(y, x)$ depends on the sampling scheme. Regression models can be used for the study of dependence provided that the conditional distribution $p(y \mid x)$ for random samples agrees with $p_x(y)$ as determined by the regression model for a fixed sample having a non-random configuration $x$. The paper develops a model that avoids the concept of a fixed population of units, thereby forcing the sampling plan to be incorporated into the sampling distribution. For a quota sample having a predetermined covariate configuration $x$, the sampling distribution agrees with the standard logistic regression model with correlated components. For most natural sampling plans, such as sequential or simple random sampling, the conditional distribution $p(y \mid x)$ is not the same as the regression distribution unless $p_x(y)$ has independent components. In this sense, most natural sampling schemes involving binary random-effects models are biased. The implications of this formulation for subject-specific and population-averaged procedures are explored.
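Two of the abstract's claims can be checked numerically: a binary random-effects model gives $p_x(y)$ correlated components, and if the covariate configuration of a sample carries information about the random effect, then $p(y \mid x)$ differs from $p_x(y)$. The sketch below is not the paper's construction. It assumes a logistic-normal random-intercept model, hypothetical parameter values (`BETA0`, `BETA1`, `SIGMA`), a hypothetical selection rule $P(x = 1 \mid \eta) = \operatorname{expit}(\eta)$ chosen only to make the covariate informative about $\eta$, and simple midpoint-rule quadrature over $\eta$. All function names are illustrative.

```python
import math

# Hypothetical parameter values chosen for illustration only.
BETA0, BETA1, SIGMA = -0.5, 1.0, 1.5


def expit(t: float) -> float:
    """Inverse logit: 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def normal_grid(sigma: float, n: int = 2001, width: float = 8.0):
    """Midpoint-rule nodes and normalised weights for eta ~ N(0, sigma^2).

    sigma == 0 gives a point mass at 0 (no random effect)."""
    if sigma == 0.0:
        return [0.0], [1.0]
    h = 2.0 * width * sigma / n
    nodes = [-width * sigma + (k + 0.5) * h for k in range(n)]
    dens = [math.exp(-0.5 * (e / sigma) ** 2) for e in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]  # weights sum to 1


def bern(p: float, y: int) -> float:
    """Bernoulli probability of outcome y in {0, 1}."""
    return p if y == 1 else 1.0 - p


def joint_px(x1: float, x2: float, sigma: float = SIGMA) -> dict:
    """p_x(y1, y2) for two units that share one random intercept eta."""
    nodes, w = normal_grid(sigma)
    out = {}
    for y1 in (0, 1):
        for y2 in (0, 1):
            out[(y1, y2)] = sum(
                wk
                * bern(expit(BETA0 + BETA1 * x1 + e), y1)
                * bern(expit(BETA0 + BETA1 * x2 + e), y2)
                for e, wk in zip(nodes, w)
            )
    return out


def marginal_px(x: float, sigma: float = SIGMA) -> float:
    """Regression probability p_x(y = 1) for a single unit, eta integrated out."""
    nodes, w = normal_grid(sigma)
    return sum(wk * expit(BETA0 + BETA1 * x + e) for e, wk in zip(nodes, w))


def sampled_conditional(x: int, sigma: float = SIGMA) -> float:
    """p(y = 1 | x) for a sampled unit whose covariate carries information about eta.

    HYPOTHETICAL selection rule: P(x = 1 | eta) = expit(eta), so eta given x
    no longer follows N(0, sigma^2)."""
    nodes, w = normal_grid(sigma)
    sel = [expit(e) if x == 1 else 1.0 - expit(e) for e in nodes]
    num = sum(wk * s * expit(BETA0 + BETA1 * x + e) for e, wk, s in zip(nodes, w, sel))
    den = sum(wk * s for wk, s in zip(w, sel))
    return num / den


if __name__ == "__main__":
    j = joint_px(0, 1)
    p1 = j[(1, 0)] + j[(1, 1)]  # P(Y1 = 1)
    p2 = j[(0, 1)] + j[(1, 1)]  # P(Y2 = 1)
    print("cov(Y1, Y2):", j[(1, 1)] - p1 * p2)  # positive: correlated components
    slope = logit(marginal_px(1)) - logit(marginal_px(0))
    print("population-averaged log-odds ratio:", slope, "vs subject-specific:", BETA1)
    for x in (0, 1):
        print(f"x={x}  regression p_x: {marginal_px(x):.4f}  sampled p(y|x): {sampled_conditional(x):.4f}")
```

Passing `sigma=0.0` removes the random effect, so the components become independent and the sampled and regression probabilities coincide, in line with the abstract's "unless $p_x(y)$ has independent components" condition. The population-averaged log-odds ratio is also smaller than `BETA1`, one concrete face of the subject-specific versus population-averaged distinction.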
Similar resources
Absent or undetected? Effects of non-detection of species occurrence on wildlife–habitat models
Presence–absence data are used widely in analysis of wildlife–habitat relationships. Failure to detect a species’ presence in an occupied habitat patch is a common sampling problem when the population size is small, individuals are difficult to sample, or sampling effort is limited. In this paper, the influence of non-detection of occurrence on parameter estimates of logistic regression models ...
Sampling Bias and Class Imbalance in Maximum-likelihood Logistic Regression
Logistic regression is a widely used statistical method to relate a binary response variable to a set of explanatory variables and maximum likelihood is the most commonly used method for parameter estimation. A maximum-likelihood logistic regression (MLLR) model predicts the probability of the event from binary data defining the event. Currently, MLLR models are used in a myriad of fields inclu...
Prediction of unwanted pregnancies using logistic regression, probit regression and discriminant analysis
Background: Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. Methods: In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were ...
Estimating Population Abundance Using Sightability Models: R SightabilityModel package
This introduction to the R SightabilityModel package is a slight modification of Fieberg (2012), published in the Journal of Statistical Software. Sightability models are binary logistic-regression models used to estimate and adjust for visibility bias in wildlife population surveys (Steinhorst and Samuel 1989). Estimation proceeds in 2 stages: 1) sightability trials are conducted with marked in...
Comparison of Gestational Diabetes Prediction Between Logistic Regression, Discriminant Analysis, Decision Tree and Artificial Neural Network Models
Background and Objectives: Gestational Diabetes Mellitus (GDM) is the most common metabolic disorder in pregnancy. In case of early detection, some of its complications can be prevented. The aim of this study was to investigate early prediction of GDM by logistic regression (LR), discriminant analysis (DA), decision tree (DT) and perceptron artificial neural network (ANN) and to compare these m...